Recap from last week



Administrative: First midterm

  1. First midterm will be on March 4th (Tuesday) during class time.

  2. Open ‘book’ exam: open notes, but you must work on it by yourself only.

  3. Exam will be published and go live on Canvas starting at 5:30 pm.

  4. Exam deliverables (i.e., .qmd file, rendered HTML, csv file I may ask you to create) are due by 8:30 pm on Canvas on March 4th.

  5. I will accept late submissions up to 11:59 pm on March 4th, but your final exam score will be capped at 85%.

  6. You may take it anywhere.

  7. Although I will not be there, our classroom is available for your use.

  8. Exam will cover lecture materials, in-class exercises, and homework problems from Lectures 1-6 only.



Introduction to Map Making and Census Data

Introduction to Map Making and Census Data I (today) will focus on background information on how geographic data are represented in the United States, US demographic data from the Census Bureau, and static map making using R.

Introduction to Map Making and Census Data II (next week) will focus on interactive map making using R.


Note: Subject to change based on how much material we can cover today.



Discuss US-Based Geospatial and Demographic Data: Goals Today



Before we continue

Let us pause for a minute or two before we continue with the workshop.

Go to: https://api.census.gov/data/key_signup.html

We will need an API key for the censusapi and tidycensus packages.
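Once your key arrives by email, you can register it in R. A minimal sketch (the key string below is a placeholder, not a real key):

```r
# censusapi reads the key from the CENSUS_KEY environment variable:
Sys.setenv(CENSUS_KEY = "YOUR_KEY_HERE")  # placeholder, not a real key

# tidycensus provides its own helper; install = TRUE saves the key
# to your .Renviron so it persists across R sessions:
library(tidycensus)
census_api_key("YOUR_KEY_HERE", install = TRUE)
```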



Geographic Identifiers (GEOIDs): The Basics.

Geographic identifiers (or GEOIDs) are numeric codes that uniquely identify all administrative/legal and statistical geographic areas.

Here in the US, we primarily use what are called Federal Information Processing Series (FIPS) codes.
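To make the idea concrete, an 11-digit census tract GEOID is just nested FIPS codes glued together (2 state digits + 3 county digits + 6 tract digits). A small sketch (the GEOID below is an illustrative example, not a specific real location):

```r
# Decomposing an 11-digit tract GEOID into its FIPS pieces.
geoid <- "09009361401"  # hypothetical example: 2 + 3 + 6 digits

state_fips  <- substr(geoid, 1, 2)   # state code
county_fips <- substr(geoid, 3, 5)   # county code within the state
tract_code  <- substr(geoid, 6, 11)  # tract code within the county
```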



Geographic hierarchies

Figure 1. Geographic hierarchies in the United States. Typically, the key notable geographic levels scientists and policy makers concern themselves with are: (1) State, (2) County, (3) Census Tract, (4) Census Block, and (5) Zip Code Tabulation Areas (ZCTAs).



Geographic hierarchies

Figure 2. Geographic hierarchies in the United States. Visual representation of how counties, census tracts, block groups, and blocks are nested within one another.



Federal Information Processing Standards (FIPS)

Figure 3. FIPS standards across geographic hierarchies. Again, GEOID/FIPS codes are typically what we use to identify both the geographic level and specific location we are working with. FIPS and GEOID are often used synonymously with one another.



States FIPS

You can get this from many websites. This specific list is from the Census Bureau directly.




A Quick Detour: Non-Census Data at County and Tract Level

Federal agencies and researchers are increasingly using the CDC/ATSDR Social Vulnerability Index (SVI).

From CDC: “Natural disasters and infectious disease outbreaks can pose a threat to a community’s health. Socially vulnerable populations are especially at risk during public health emergencies because of factors like socioeconomic status, household composition, minority status, or housing type and transportation.”

SVI Availability: Data are available at county- and tract- level.

Index Range: The index (labelled RPL_THEMES in the dataset) is a percentile rank between 0 (least vulnerable) and 1 (most vulnerable).
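Because RPL_THEMES is a score between 0 and 1, a common first analysis step is to bin it into quartile categories. A hedged sketch, assuming a data frame `svi_df` with an RPL_THEMES column (both names are assumptions here):

```r
# Binning the SVI score into quartile categories.
# Assumes a data frame `svi_df` with an RPL_THEMES column (0 to 1).
svi_df$svi_quartile <- cut(
  svi_df$RPL_THEMES,
  breaks = c(0, 0.25, 0.5, 0.75, 1),
  labels = c("Low", "Low-Medium", "Medium-High", "High"),
  include.lowest = TRUE  # so a score of exactly 0 lands in "Low"
)
```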

Note: Some CDC data sets are currently down (permanently or temporarily) as federal agencies are assessing their compliance with President Trump’s Executive Orders. The site for CDC’s SVI dataset is currently down.



Social Vulnerability Index Scoring Breakdown

Figure 4. Factors that impact the index for the 2022 SVI. The latest SVI data is for 2022. The dataset calculates the SVI using Census ACS data (more on this later) for 2018-2022.


Partial SVI Data at County-Level:


Partial SVI Data at Census Tract-Level:


Coordinates: Latitude vs Longitude

The numbers are in decimal degrees format and range from -90 to 90 for latitude and -180 to 180 for longitude.


Figure 5. Image showing longitude (x-axis) and latitude (y-axis).
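Coordinates reported in degrees-minutes-seconds can be converted to decimal degrees with simple arithmetic: decimal = degrees + minutes/60 + seconds/3600, negated for southern latitudes and western longitudes. A small helper sketch (the function name and example values are illustrative):

```r
# Convert degrees/minutes/seconds to decimal degrees.
dms_to_decimal <- function(deg, min, sec, negative = FALSE) {
  dd <- deg + min / 60 + sec / 3600
  if (negative) dd <- -dd  # negate for S latitude or W longitude
  dd
}

dms_to_decimal(41, 18, 13.5)                   # latitude:  41.30375
dms_to_decimal(72, 55, 55.8, negative = TRUE)  # longitude: -72.9322 (W)
```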


Free geocoding tools

You likely know which county you are in. However, you likely do not know the specific census tract or block group you are in.

Luckily, Census has free geocoding tools (e.g., web interface, API) that can help us!

Web interface link is: https://geocoding.geo.census.gov/geocoder/geographies/address?form

Let us do a quick live demo of this tool.


Free geocoding tools in R: forward geocoding

You can use the tidygeocoder package in R.

The code below is an example of forward geocoding (addresses ⮕ coordinates). By default, geocode() uses Nominatim (a geocoding software package) to perform the task.

# run if not installed before: install.packages("tidygeocoder")
library(tidygeocoder)
addresses_df <- data.frame(address = c("60 College St, New Haven, CT"))                              

geocode(addresses_df, address = address)
## Passing 1 address to the Nominatim single address geocoder
## Query completed in: 1 seconds
## # A tibble: 1 × 3
##   address                        lat  long
##   <chr>                        <dbl> <dbl>
## 1 60 College St, New Haven, CT  41.3 -72.9



Free geocoding tools in R: forward geocoding

It can process multiple addresses:

# run if not installed before: install.packages("tidygeocoder")
library(tidygeocoder)
addresses_df_v2 <- data.frame(address = c("1600 Pennsylvania Avenue, Washington, DC",
                                       "1313 Disneyland Dr, Anaheim, CA"))                              

geocode(addresses_df_v2, address = address)
## Passing 2 addresses to the Nominatim single address geocoder
## Query completed in: 2 seconds
## # A tibble: 2 × 3
##   address                                    lat   long
##   <chr>                                    <dbl>  <dbl>
## 1 1600 Pennsylvania Avenue, Washington, DC  38.9  -77.0
## 2 1313 Disneyland Dr, Anaheim, CA           33.8 -118.



Free geocoding tools in R: forward geocoding

If you are interested in determining the geographies (e.g., county or tract information), you need to use the census method, which leverages the Census API. For the full output, see the next slide.

# run if not installed before: install.packages("tidygeocoder")
library(tidygeocoder)
addresses_df_v3 <- data.frame(address = c("26 Plympton St, Cambridge, MA"))                              

results_df <- geocode(addresses_df_v3, address = address, method = "census", 
        full_results = TRUE, api_options = list(census_return_type = 'geographies'))



Some extractable information from results_df

# Script you need to run to see county information
results_df$geographies.Counties
## [[1]]
##   GEOID     CENTLAT AREAWATER STATE  BASENAME            OID LSADC FUNCSTAT
## 1 25017 +42.4853699  75225346    25 Middlesex 27590260056194    06        N
##      INTPTLAT             NAME OBJECTID      CENTLON COUNTYCC COUNTYNS
## 1 +42.4817215 Middlesex County     2089 -071.3915833       H4 00606935
##     AREALAND     INTPTLON MTFCC COUNTY
## 1 2118357703 -071.3949160 G4020    017
# Script you need to run to see census tract information
results_df$`geographies.Census Tracts`
## [[1]]
##         GEOID     CENTLAT AREAWATER STATE BASENAME            OID LSADC
## 1 25017353700 +42.3745268         0    25     3537 20790260102279    CT
##   FUNCSTAT    INTPTLAT              NAME OBJECTID  TRACT      CENTLON AREALAND
## 1        S +42.3745268 Census Tract 3537    17676 353700 -071.1122878   532087
##       INTPTLON MTFCC COUNTY
## 1 -071.1122878 G5020    017



What other geographic details are available?

Pay close attention to column names with a geographies prefix.

colnames(results_df)[7:17]
##  [1] "geographies.States"                                  
##  [2] "geographies.Combined Statistical Areas"              
##  [3] "geographies.County Subdivisions"                     
##  [4] "geographies.Urban Areas"                             
##  [5] "geographies.Incorporated Places"                     
##  [6] "geographies.Counties"                                
##  [7] "geographies.2024 State Legislative Districts - Upper"
##  [8] "geographies.2024 State Legislative Districts - Lower"
##  [9] "geographies.2020 Census Blocks"                      
## [10] "geographies.Census Tracts"                           
## [11] "geographies.119th Congressional Districts"



Free geocoding tools in R: reverse geocoding

You can use the tidygeocoder package in R.

The code below is an example of reverse geocoding (coordinates ⮕ addresses).

reverse_geo(lat = "41.30374", long = "-72.93216")
## Passing 1 coordinate to the Nominatim single coordinate geocoder
## Query completed in: 1 seconds
## # A tibble: 1 × 3
##     lat  long address                                                           
##   <dbl> <dbl> <chr>                                                             
## 1  41.3 -72.9 Laboratory of Epidemiology and Public Health, 60, College Street,…



Three minute practice

Do Practice 1 under the Quarto Notebook called Lecture 5 Scripts and Practice from Canvas.



List of Census Surveys and Datasets

Figure 6. Screenshot of page showing all the surveys performed by the US Census Bureau.



Quick overview of Small Area Health Insurance Estimates (SAHIE)

You may access the large yearly SAHIE datasets by clicking this.

Recommended: Access the data using the SAHIE interactive tool, which can be reached by clicking this. This greatly minimizes data cleaning/subsetting tasks.

Figure 7. Screenshot of the SAHIE dashboard.



SAHIE interactive tool

Let us do a quick live demo of downloading county-level SAHIE data for Alabama.

Our goal is to download and clean data to be ready for reading into R.



The American Community Survey (ACS) Data

The American Community Survey (ACS) is an ongoing yearly survey.

Arguably the most widely used data set from the Census Bureau. Census Bureau: “[ACS] is the premier source for detailed population and housing information about our nation.”

Two yearly versions: ACS 1-year and ACS 5-year.

Note: For older data (2007-2013), 3-year estimates exist.

Language: If someone says they’re using the 5-year 2020 ACS data, they’re referring to the 2016-2020 5-year ACS data.



The American Community Survey (ACS) Data

Many ACS-related documentation online.

What to look for and remember: Table Shells. Table shells (particularly for detailed tables) provide a comprehensive list of variable documentation for ACS data.

Let us download the ACS 2022 Table Shells and go over them together. Let us search for disability-related variables.



The American Community Survey (ACS) Table Shells

Figure 8. Excel Table Shell.

Figure 9. API Table Shell.

UniqueID from the Excel Table Shell (or Name in the API Table Shell) represents the variable(s) we want to get from the Census ACS data. Technically, Name in the API Table Shell is more precise since it has the suffix “E”, which represents “estimate”.

Do note that if you’re using the Excel Table Shell, you’ll eventually need to add the suffix “E” when requesting data from the Census API.
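The estimate suffix can be appended programmatically rather than by hand. A minimal sketch (the UniqueIDs below are examples pulled for illustration):

```r
# Convert Excel Table Shell UniqueIDs into API variable names
# by appending the "E" (estimate) suffix.
unique_ids <- c("B01001_001", "B18101_001")  # example UniqueIDs
api_vars <- paste0(unique_ids, "E")
api_vars
## [1] "B01001_001E" "B18101_001E"
```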



censusapi package: pulling data into R.

Suppose we are interested in county-level population data. According to the ACS 2022 table shell, the variable we need to use is B01001_001E.

We can use the censusapi package to get census data directly into R.

# run this if not installed: install.packages("censusapi")
library(censusapi)
Sys.setenv(CENSUS_KEY="") # put your API key here
population_data <- getCensus(
  name = "acs/acs5", # requests ACS5 data
  vintage = 2022, # requests 2022 data
  vars = c("B01001_001E"), #requested variable
  region = "county:*") #requested geography
head(population_data)
##   state county B01001_001E
## 1    01    001       58761
## 2    01    003      233420
## 3    01    005       24877
## 4    01    007       22251
## 5    01    009       59077
## 6    01    011       10328



Examples of Other Important censusapi calls

An example of asking for multiple variables:

data1 <- getCensus(name = "acs/acs5", vintage = 2019, 
                   vars = c("B28002_001E", "B28002_002E"), 
                   region = "county:*")

An example of asking for Connecticut-only county-level data (Note: CT’s FIPS code is 09).

data2 <- getCensus(name = "acs/acs5", vintage = 2019, 
                   vars = c("B28002_001E", "B28002_002E"), 
                  region = "county:*", regionin = "state:09")

An example of asking for Missouri-only tract-level data (Note: MO’s FIPS code is 29).

data3 <- getCensus(name = "acs/acs5", vintage = 2019, 
                   vars = c("B28002_001E", "B28002_002E"), 
                   region = "tract:*", regionin = "state:29")



Quick Detour: Merging data

Merging data from one data frame (table) into another data frame (table) using left_join(). Suppose superheroes and publishers are two data frames:

Figure 10. Visualizing left_join(). Excerpt of resources I created last semester for my DATA 412/612 course. I like to think of left_join as supplementing a table with data from another table. Note: This works because both tables have a variable named ‘publisher’ (this required common variable is called a join key).
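A minimal runnable sketch of the same idea (the rows below are illustrative; the exact tables in the figure may differ):

```r
library(dplyr)

# Two hypothetical tables; the join key is `publisher`.
superheroes <- data.frame(
  name      = c("Batman", "Spider-Man", "Hellboy"),
  publisher = c("DC", "Marvel", "Dark Horse")
)
publishers <- data.frame(
  publisher  = c("DC", "Marvel"),
  yr_founded = c(1934, 1939)
)

# Every row of `superheroes` is kept; yr_founded is NA where
# no matching publisher exists (here, "Dark Horse").
left_join(superheroes, publishers, by = "publisher")
```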


The typical recipe: Making Maps the Simple Way

  1. We need a shapefile, a digital file format for storing geographic location and associated attribute information (e.g., points, lines, or polygons). Think of the boundaries, shapes, and geometric information that delineate locations. We will use the tigris package to get the shapefiles directly from the US Census Bureau, so we don’t have to manually load actual shapefiles (.shp) into R.

  2. We need the data we are interested in mapping.

  3. Merge data (e.g., SVI data) to the shapefile (which is also a data frame).

  4. Leverage ggplot2 package to render the map.


For this example: We will map the SVI data from the CDC for Missouri. You may download the relevant file from my Github repo (link in two slides). Partial data is printed on next page to study the data together.

Before we proceed: Run install.packages("tigris") via your R console if you don’t have it installed.



Partial MO’s 2019 Tract-level SVI Data



Creating SVI Map for MO

Step 1: Retrieve shapefile needed.

library(tigris) 
mo_shape_file <- tracts(state = "MO", year = 2019)

# not required but nice to visualize data
head(mo_shape_file)
## Simple feature collection with 6 features and 12 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -93.32762 ymin: 37.06259 xmax: -91.09394 ymax: 38.37489
## Geodetic CRS:  NAD83
##   STATEFP COUNTYFP TRACTCE       GEOID    NAME             NAMELSAD MTFCC
## 1      29      055  450302 29055450302 4503.02 Census Tract 4503.02 G5020
## 2      29      055  450102 29055450102 4501.02 Census Tract 4501.02 G5020
## 3      29      055  450200 29055450200    4502    Census Tract 4502 G5020
## 4      29      055  450400 29055450400    4504    Census Tract 4504 G5020
## 5      29      015  460400 29015460400    4604    Census Tract 4604 G5020
## 6      29      229  490300 29229490300    4903    Census Tract 4903 G5020
##   FUNCSTAT     ALAND   AWATER    INTPTLAT     INTPTLON
## 1        S  59019556    54839 +38.0699995 -091.3834407
## 2        S 215515312   158937 +38.1505661 -091.1929142
## 3        S 785265618   714683 +37.9120761 -091.2086380
## 4        S 518540939   475755 +37.8958096 -091.3892205
## 5        S 216350354 11553444 +38.3016635 -093.1718555
## 6        S 335405942   257629 +37.1176895 -092.5083341
##                         geometry
## 1 POLYGON ((-91.42897 38.0501...
## 2 POLYGON ((-91.31192 38.1507...
## 3 POLYGON ((-91.3684 38.09352...
## 4 POLYGON ((-91.52872 37.7942...
## 5 POLYGON ((-93.32762 38.2696...
## 6 POLYGON ((-92.68554 37.0748...

Step 2: Load MO SVI data into R (finalize data we want to map)

library(tidyverse) 
mo_svi_data <- read_csv("https://raw.githubusercontent.com/jmtfeliciano/teachingdata/refs/heads/main/MissouriSVI2019.csv") |>
  mutate(GEOID = as.character(FIPS)) # create a character GEOID column from FIPS (the join key)

Step 3: Merge SVI data into shapefile then create new shapefile.

mo_shape_file_v2 <- left_join(mo_shape_file, mo_svi_data) 

Step 4: Plot map (Note: RPL_THEMES is the SVI variable).

ggplot(data = mo_shape_file_v2, mapping = aes(fill = RPL_THEMES)) +
  geom_sf()



Generating Map


ggplot(data = mo_shape_file_v2, aes(fill = RPL_THEMES)) +
  geom_sf()





Further Customizations


# Added theme_void() 
# to remove grid and grey background
ggplot(data = mo_shape_file_v2, aes(fill = RPL_THEMES)) +
  geom_sf() +
  theme_void()





Further Customizations Part 2


# Further customizes labels and color gradient
ggplot(data = mo_shape_file_v2, aes(fill = RPL_THEMES)) +
  geom_sf() +
  theme_void() +
  scale_fill_gradient(low="#1fa187", 
                      high="#440154") +
  labs(fill='MO-Specific SVI')



Note: “#1fa187” and “#440154” above are what are called hexadecimal representation of colors. An excellent detailed guide on colors in R can be found by clicking this resource from UCSB.



tigris package shapefiles

In the previous example, we used tracts(state = "MO", year = 2019) to get the tract-specific shapefile for MO.

Many other shape files are available. Two key examples:

For state-level map: states().

For county-level map: counties().

To the best of my knowledge, there are 40-50 shapefiles available (e.g., AIANNH [American Indian, Alaska Native, and Native Hawaiian] boundaries, ZIP Code Tabulation Area (ZCTA) boundaries).
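A hedged sketch of a few of these functions (arguments mirror the tracts() call used earlier):

```r
library(tigris)

us_states   <- states(year = 2019)                  # state boundaries
ct_counties <- counties(state = "CT", year = 2019)  # CT county boundaries
us_zctas    <- zctas(year = 2019)                   # ZCTA boundaries (large download)
```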

Speaking of ZCTA, a brief comment on ZCTA.



Detour: Zip Code Tabulation Area (ZCTA)

ACS datasets are also available at the ZCTA-level.

This might sound like the zip codes we use in our addresses. But they are not the same.

Most of the time, your postal zip code is the same as the ZCTA.

Zip codes primarily used by the Postal Service for P.O. boxes will likely belong to a different ZCTA. The same is true for areas with few residential addresses (or areas that are primarily occupied by commercial businesses).

There are available crosswalks out there that can help you convert between postal zip codes and ZCTA (e.g., crosswalk from censusreporter).

Please talk to a geographer or demographer before doing any comprehensive work with zip codes or ZCTA.



Practice #2: with county-level data.

Let us switch to the Quarto file again. We will work on this together as a class for the next few minutes.



tidycensus package and census data.

tidycensus is an R package that allows users to interface with a select number of the US Census Bureau’s data APIs and return data frames.

If you want to map ACS-related data, the tidycensus package is the most convenient way to go. One of its advantages is that it can return not just the requested variable(s) but also the corresponding shapefile needed, so the data arrive ready for mapping.

Before going further, load the tidycensus package:

# run install.packages("tidycensus") if not installed
library(tidycensus)



tidycensus package and census data.

The script below uses load_variables() to list the available variables within the 2022 ACS5 data; this is a table shell, but loaded into R as a data frame. (Remember, when someone refers to ‘2022 ACS 5-year data’, the estimates actually use data for 2018-2022.)

variable_list_2022 <- load_variables(2022, "acs5", cache = TRUE)
nrow(variable_list_2022)
## [1] 28152



tidycensus package and census data

Look familiar?

head(variable_list_2022)
## # A tibble: 6 × 4
##   name        label                                   concept          geography
##   <chr>       <chr>                                   <chr>            <chr>    
## 1 B01001A_001 Estimate!!Total:                        Sex by Age (Whi… tract    
## 2 B01001A_002 Estimate!!Total:!!Male:                 Sex by Age (Whi… tract    
## 3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years  Sex by Age (Whi… tract    
## 4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years   Sex by Age (Whi… tract    
## 5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years Sex by Age (Whi… tract    
## 6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years Sex by Age (Whi… tract



tidycensus package and census data

Advanced recipe: using basic text mining skills in R to find tables related to Medicare.

variable_list_2022 |>
  filter(str_detect(concept, regex("medicare", ignore_case = TRUE))) |>
  relocate(concept) # relocate() moves concept into the first column
## # A tibble: 24 × 4
##    concept                         name        label                   geography
##    <chr>                           <chr>       <chr>                   <chr>    
##  1 Allocation of Medicare Coverage B992706_001 Estimate!!Total:        tract    
##  2 Allocation of Medicare Coverage B992706_002 Estimate!!Total:!!Allo… tract    
##  3 Allocation of Medicare Coverage B992706_003 Estimate!!Total:!!Not … tract    
##  4 Medicare Coverage by Sex by Age C27006_001  Estimate!!Total:        tract    
##  5 Medicare Coverage by Sex by Age C27006_002  Estimate!!Total:!!Male: tract    
##  6 Medicare Coverage by Sex by Age C27006_003  Estimate!!Total:!!Male… tract    
##  7 Medicare Coverage by Sex by Age C27006_004  Estimate!!Total:!!Male… tract    
##  8 Medicare Coverage by Sex by Age C27006_005  Estimate!!Total:!!Male… tract    
##  9 Medicare Coverage by Sex by Age C27006_006  Estimate!!Total:!!Male… tract    
## 10 Medicare Coverage by Sex by Age C27006_007  Estimate!!Total:!!Male… tract    
## # ℹ 14 more rows



tidycensus package

Task: Suppose we want to map the median % of household income spent on rent for each state using variable B25071_001.

What to run for county-level ACS5 data:


library(tidycensus)
census_api_key("YOUR CENSUS API KEY HERE")
shapefile_with_data <- get_acs(
  geography = "state",
  variables = "B25071_001",
  year = 2019,
  survey = "acs5",
  geometry = TRUE,
  shift_geo = TRUE
)

The key part here: make sure geometry = TRUE, as the default is FALSE. Setting geometry to TRUE instructs get_acs() to return the final data as an sf object (shapefile) that is ready for map rendering via ggplot2. shift_geo = TRUE is also important, as it compresses the distance between the contiguous United States and Alaska, Hawaii, and Puerto Rico.

NOTE: when we looked at the table shells, we added ‘E’ at the end of the variable name when using the censusapi package. For tidycensus, it is not required.



Rendering the map

ggplot(data = shapefile_with_data,
       aes(fill = estimate)) + 
  geom_sf() +
  theme_void() +
  labs(fill='Median Gross Rent as a % of Household Income') +
  scale_fill_gradient(low="#1fa187", 
                      high="#440154") +
  theme(legend.position="bottom")



Note: “#1fa187” and “#440154” above are what are called hexadecimal representation of colors. An excellent detailed guide on colors in R can be found by clicking this resource from UCSB.



Importance of shifting geometry

I mentioned earlier that shift_geo = TRUE is important.

Here’s the map you’d generate without setting that argument as TRUE.



Rendering the map: Example 2 (Full Template)

library(tidycensus)
library(tidyverse)
census_api_key("YOUR CENSUS API KEY HERE")

ct_shapefile_with_data <- get_acs(
  geography = "county",
  state = "CT",
  variables = "B25071_001",
  year = 2019,
  survey = "acs5",
  geometry = TRUE 
  # shift_geo is not needed if you're not mapping entire US
) 

ggplot(data = ct_shapefile_with_data,
       aes(fill = estimate)) + 
  geom_sf() +
  theme_void() +
  labs(fill='Median Gross Rent as a % of Household Income') +
  scale_fill_gradient(low="white", 
                      high="black") 

See next slide for the rendered map.



Rendering the map: Example 2



Impending syntax change for tidycensus: nationwide map

Current syntax:

shapefile_with_data <- get_acs(
  geography = "state",
  variables = "B25071_001E",
  year = 2019,
  survey = "acs5",
  geometry = TRUE,
  shift_geo = TRUE
)

Future release syntax:

shapefile_with_data <- get_acs(
  geography = "state",
  variables = "B25071_001E",
  year = 2019,
  survey = "acs5",
  geometry = TRUE
) |>
  shift_geometry()



Other functions from tidycensus

get_estimates() can give you detailed information about population characteristics. In your own time, try changing the value of product to the following: “components”, “population”, or “characteristics”.

get_estimates(geography = "state", product = "components", vintage = 2023)
## Using the Vintage 2023 Population Estimates
## # A tibble: 676 × 5
##    GEOID NAME    variable          year     value
##    <chr> <chr>   <chr>            <int>     <dbl>
##  1 01    Alabama BIRTHS            2023 58251    
##  2 01    Alabama DEATHS            2023 59813    
##  3 01    Alabama NATURALCHG        2023 -1562    
##  4 01    Alabama INTERNATIONALMIG  2023  5384    
##  5 01    Alabama DOMESTICMIG       2023 30744    
##  6 01    Alabama NETMIG            2023 36128    
##  7 01    Alabama RESIDUAL          2023    -1    
##  8 01    Alabama RBIRTH            2023    11.4  
##  9 01    Alabama RDEATH            2023    11.7  
## 10 01    Alabama RNATURALCHG       2023    -0.307
## # ℹ 666 more rows



Other functions from tidycensus

get_flows() provides detailed migration flow data (if available).

get_flows(
  geography = "county",
  state = "NY",
  county = "New York",
  year = 2019
)
## # A tibble: 2,019 × 7
##    GEOID1 GEOID2 FULL1_NAME                FULL2_NAME    variable estimate   moe
##    <chr>  <chr>  <chr>                     <chr>         <chr>       <dbl> <dbl>
##  1 36061  <NA>   New York County, New York Africa        MOVEDIN       468   182
##  2 36061  <NA>   New York County, New York Africa        MOVEDOUT       NA    NA
##  3 36061  <NA>   New York County, New York Africa        MOVEDNET       NA    NA
##  4 36061  <NA>   New York County, New York Asia          MOVEDIN      9911  1039
##  5 36061  <NA>   New York County, New York Asia          MOVEDOUT       NA    NA
##  6 36061  <NA>   New York County, New York Asia          MOVEDNET       NA    NA
##  7 36061  <NA>   New York County, New York Central Amer… MOVEDIN      1553   857
##  8 36061  <NA>   New York County, New York Central Amer… MOVEDOUT       NA    NA
##  9 36061  <NA>   New York County, New York Central Amer… MOVEDNET       NA    NA
## 10 36061  <NA>   New York County, New York Caribbean     MOVEDIN      2783   712
## # ℹ 2,009 more rows



Post-class exercise ideas:

Use tidycensus to create other maps.

Use the tigris package with outside data of your choosing, and create a map of your own!